Recap of Relational Data

STAT 331

Relational Data

Mutating joins

Adds information from a new dataframe to observations in an existing dataframe

Filtering joins

Keeps or removes observations in an existing dataframe based on whether they have a match in a new dataframe
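A minimal sketch of the difference, assuming dplyr is installed and using small hypothetical tibbles (`surveys_toy`, `plots_toy`) in place of the real data. A mutating join such as `left_join()` adds columns, while a filtering join such as `semi_join()` only decides which rows to keep.

```r
library(dplyr)

# Hypothetical stand-ins for surveys and plots
surveys_toy <- tibble(
  record_id = 1:3,
  plot_id   = c(2, 3, 2)
)
plots_toy <- tibble(
  plot_id   = c(1, 2),
  plot_type = c("Spectab exclosure", "Control")
)

# Mutating join: every surveys row is kept and gains a plot_type column
# (plot 3 has no match, so its plot_type is NA)
mutated <- surveys_toy |> left_join(plots_toy, by = "plot_id")

# Filtering join: only rows whose plot_id appears in plots are kept,
# and no columns are added
filtered <- surveys_toy |> semi_join(plots_toy, by = "plot_id")
```

Here `mutated` has three rows and columns `record_id`, `plot_id`, `plot_type`, while `filtered` has two rows and keeps only the original two columns.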

Keys

  • Uniquely identify an observation in a dataset

  • Relate datasets to each other
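One way to check that a column really is a key is to count its values and look for any that occur more than once. This sketch assumes dplyr and uses hypothetical tibbles modeled on the tables that follow: `plot_id` should be unique in `plots`, but it may repeat in `surveys`, where it points each record at one plot.

```r
library(dplyr)

# Hypothetical stand-ins for the plots and surveys tables
plots_toy <- tibble(
  plot_id   = c(1, 2),
  plot_type = c("Spectab exclosure", "Control")
)
surveys_toy <- tibble(
  record_id = 1:3,
  plot_id   = c(2, 3, 2)
)

# A key should have no value that occurs more than once
plots_dups <- plots_toy |>
  count(plot_id) |>
  filter(n > 1)

# In surveys, plot_id repeats: it is not a key for surveys,
# but it still relates each record to one row of plots
surveys_dups <- surveys_toy |>
  count(plot_id) |>
  filter(n > 1)
```

An empty `plots_dups` is consistent with `plot_id` identifying each plot; the same check could be run on `record_id` for `surveys`.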

surveys

record_id  month  day  year  plot_id  species_id  sex  hindfoot_length  weight
1          7      16   1977  2        NL          M    32               NA
2          7      16   1977  3        NL          M    33               NA

plots

plot_id  plot_type
1        Spectab exclosure
2        Control

species

species_id  genus             species    taxa
AB          Amphispiza        bilineata  Bird
AH          Ammospermophilus  harrisi    Rodent
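These keys chain the three tables together: `plot_id` links `surveys` to `plots`, and `species_id` links `surveys` to `species`. A minimal sketch, assuming dplyr; the two `surveys_toy` rows are hypothetical (chosen so every key has a match), while `plots_toy` and `species_toy` copy the rows shown above.

```r
library(dplyr)

# Hypothetical survey records
surveys_toy <- tibble(
  record_id  = c(1, 2),
  plot_id    = c(2, 1),
  species_id = c("AH", "AB")
)
# Rows copied from the plots and species excerpts
plots_toy <- tibble(
  plot_id   = c(1, 2),
  plot_type = c("Spectab exclosure", "Control")
)
species_toy <- tibble(
  species_id = c("AB", "AH"),
  genus      = c("Amphispiza", "Ammospermophilus"),
  species    = c("bilineata", "harrisi"),
  taxa       = c("Bird", "Rodent")
)

# Join on plot_id first, then on species_id
surveys_full <- surveys_toy |>
  left_join(plots_toy, by = "plot_id") |>
  left_join(species_toy, by = "species_id")
```

Because `left_join()` keeps every row of its left-hand table, a record whose key is missing from a lookup table would get `NA` in the added columns rather than being dropped.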

Mean Weights

library(tidyverse)

# Mean weight per species, ignoring records with a missing weight
surveys_weight <- surveys |>
  drop_na(weight) |>
  group_by(species_id) |>
  summarize(mean_weight = mean(weight))
species_id  mean_weight
BA              8.60000
DM             43.15786
DO             48.87052
DS            120.13055
NL            159.24566
OL             31.57526
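`drop_na(weight)` matters because `mean()` returns `NA` as soon as one value is missing. A sketch comparing it with `mean(weight, na.rm = TRUE)`, assuming dplyr and tidyr and a small hypothetical tibble `toy` that includes a made-up species `"XX"` whose weights are all missing:

```r
library(dplyr)
library(tidyr)

# Hypothetical weights; "XX" has no recorded weight at all
toy <- tibble(
  species_id = c("DM", "DM", "DM", "XX"),
  weight     = c(40, 46, NA, NA)
)

# Slide approach: drop rows with a missing weight, then summarize
with_drop <- toy |>
  drop_na(weight) |>
  group_by(species_id) |>
  summarize(mean_weight = mean(weight))

# Alternative: keep every row and let mean() skip the NAs
with_narm <- toy |>
  group_by(species_id) |>
  summarize(mean_weight = mean(weight, na.rm = TRUE))
```

Both give a mean of 43 for `DM`, but they differ for a species with no recorded weights: `drop_na()` removes it from the summary entirely, while `na.rm = TRUE` keeps it with a mean of `NaN`.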

Filtering joins

Keeping Observations

semi_join()

# Keep rows of surveys_weight whose species_id also appears in species
surveys_weight |>
  semi_join(species, by = "species_id")
species_id  mean_weight
BA              8.60000
DM             43.15786
DO             48.87052
DS            120.13055
NL            159.24566
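`semi_join()` only keeps or drops rows of the left table: it adds no columns and never duplicates a row, even when a key appears several times in the right table. A sketch contrasting it with the mutating join `inner_join()`, assuming dplyr and hypothetical tibbles (`weights_toy`, `lookup_toy`) in which `"DM"` is deliberately duplicated in the lookup:

```r
library(dplyr)

# Hypothetical per-species summary
weights_toy <- tibble(
  species_id  = c("BA", "DM", "ZZ"),
  mean_weight = c(8.6, 43.2, 10.0)
)
# Hypothetical lookup in which "DM" appears twice
lookup_toy <- tibble(
  species_id = c("BA", "DM", "DM"),
  note       = c("a", "b", "c")
)

# Filtering join: BA and DM kept once each, ZZ dropped, no new columns
kept <- weights_toy |> semi_join(lookup_toy, by = "species_id")

# Mutating join: adds `note`, and DM appears once per matching lookup row
joined <- weights_toy |> inner_join(lookup_toy, by = "species_id")

# The same rows as `kept`, written as a filter on the key values
kept_filter <- weights_toy |> filter(species_id %in% lookup_toy$species_id)
```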

Filtering joins

Removing Observations

anti_join()

# Keep rows of species whose species_id has no match in surveys_weight
species |>
  anti_join(surveys_weight, by = "species_id")
species_id  species
AB          bilineata
AH          harrisi
AS          savannarum
AS savannarum

Connecting to Data Cleaning

Including observations with %in%

surveys |>
  filter(species_id %in% c("BA", "DM", "DS"))


Similar to semi_join()!
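`%in%` asks, for each row, whether the value is anywhere in the given set. A common slip is to write `==` instead, which compares element by element and silently recycles the shorter vector. A sketch on a hypothetical four-row tibble, assuming dplyr:

```r
library(dplyr)

# Hypothetical species codes
toy <- tibble(species_id = c("BA", "DM", "DS", "BA"))

# Keeps every row whose species_id is "BA" or "DM": rows 1, 2 and 4
in_set <- toy |> filter(species_id %in% c("BA", "DM"))

# Compares row 1 to "BA", row 2 to "DM", row 3 to "BA", row 4 to "DM",
# so the "BA" in row 4 is silently dropped
recycled <- toy |> filter(species_id == c("BA", "DM"))
```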

Connecting to Data Cleaning

Excluding observations with ! and %in%

surveys |>
  filter(!species_id %in% c("AB", "AH", "AS"))

Similar to anti_join()!
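In R, `!` has lower precedence than `%in%`, so `!species_id %in% x` is read as `!(species_id %in% x)`; there is no separate `!%in%` operator. A sketch, assuming dplyr and a hypothetical tibble `toy`, showing that both spellings and an `anti_join()` against a one-column tibble of keys keep the same rows:

```r
library(dplyr)

# Hypothetical species codes and a set of codes to exclude
toy      <- tibble(species_id = c("AB", "BA", "AS", "DM"))
excluded <- c("AB", "AH", "AS")

# `!` is applied after %in%, so these two filters are identical
kept_bare  <- toy |> filter(!species_id %in% excluded)
kept_paren <- toy |> filter(!(species_id %in% excluded))

# Filtering-join version: the excluded keys as a one-column tibble
kept_anti <- toy |>
  anti_join(tibble(species_id = excluded), by = "species_id")
```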

Tidy Data

Pivoting Longer